Approximate inference with sampling

Dave Kleinschmidt

library(tidyverse)
library(purrrlyr)

## install.packages("devtools")
## devtools::install_github("kleinschmidt/beliefupdatr", args="--preclean")
library(beliefupdatr)

What does it mean to sample from a distribution?

Updating beliefs

Quantifying uncertainty

  • We have two categories /b/ and /p/.
  • Realized as normal distributions on an acoustic cue \[ p(\mathrm{VOT} | \mu, \sigma^2) \]
  • We don’t know the mean \(\mu\) and variance \(\sigma^2\).
  • Express our uncertainty as a probability distribution over the mean and variance: \[p(\mu, \sigma^2)\]
  • This distribution assigns a degree of belief for each particular combination of mean \(\mu\) and variance \(\sigma^2\).

Learning from experience

  • How do we update our beliefs based on experience?
  • Conceptually, Bayes Rule: \[ p(\mu, \sigma^2 | x) \propto p(\mathrm{VOT}=x | \mu, \sigma^2) p(\mu, \sigma^2) \]
  • Degree of belief assigned to each \(\mu,\sigma^2\) after observing \(x\) is the product of the prior belief and how well \(x\) is predicted.

How??

updating rules

Why, god

Why, god.

Why, god..

Enough

  • Working with the distribution directly is hard.
  • Neither researchers nor brains want to do a lot of algebra.
  • What if there was a better way?!
  • Replace continuous distribution \(p(\mu, \sigma^2)\) with samples of plausible hypotheses.
  • Re-weight samples based on how well they predict the data
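For concreteness, here is what the "replace the distribution with samples" step might look like in R. The prior family and all hyperparameters are invented for illustration (σ² drawn from an inverse-gamma, μ given σ² from a normal); any prior you can sample from works the same way.

```r
set.seed(1)
K <- 5000

# Hypothetical prior over (mu, sigma^2) for the /b/ category's VOT (ms).
# These numbers are illustrative only, not from beliefupdatr:
# sigma^2 ~ inverse-gamma, mu | sigma^2 ~ normal centered at 0 ms.
sigma2_b <- 1 / rgamma(K, shape = 3, rate = 200)
mu_b     <- rnorm(K, mean = 0, sd = sqrt(sigma2_b))

# Each row is one plausible hypothesis about the /b/ category.
samples <- data.frame(mu = mu_b, sigma2 = sigma2_b)
head(samples)
```

Together the K rows stand in for the continuous distribution \(p(\mu, \sigma^2)\): regions of high prior probability get many samples, regions of low probability get few.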

One sample of prior \(p(\mu,\sigma^2)\)

Many samples approximate \(p(\mu,\sigma^2)\)

Weighting samples by importance

  • How do you update samples to reflect new information?
  • Notation: for each category, there are \(K\) samples of \((\mu_k, \sigma^2_k)\), where \(k = 1 \ldots K\).
  • Samples are all equally representative of prior, so have the same initial weight: \(w^k_0 = 1/K\).
  • Re-weight samples by the likelihood of the data given that sample (how well the hypothesis predicts the data), then renormalize so the weights sum to 1: \[ w^k_n \propto w^k_0 \, p(x_1, \ldots, x_n | \mu_k, \sigma^2_k) \]
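A minimal R sketch of this reweighting, using prior samples and observed VOT values that are invented for illustration; working on the log scale avoids numerical underflow when the number of observations grows.

```r
set.seed(2)
K <- 2000

# Illustrative prior samples of (mu, sigma^2), as in the sampling step.
sigma2 <- 1 / rgamma(K, shape = 3, rate = 200)
mu     <- rnorm(K, mean = 0, sd = sqrt(sigma2))
w0     <- rep(1 / K, K)          # equal initial weights w_0^k

# Some observed VOT values (made up).
x <- c(5, 8, 2, 10)

# Log-likelihood of the data under each sampled hypothesis (mu_k, sigma2_k).
loglik <- sapply(seq_len(K), function(k)
  sum(dnorm(x, mean = mu[k], sd = sqrt(sigma2[k]), log = TRUE)))

# w_n^k proportional to w_0^k * p(x_1, ..., x_n | mu_k, sigma2_k);
# subtract the max before exponentiating for stability, then normalize.
w <- w0 * exp(loglik - max(loglik))
w <- w / sum(w)
```

Hypotheses that predict the data well end up with large weights; hypotheses that predict it poorly are effectively pruned.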

Doing things under uncertainty

  • Even if we don’t know the means and variances exactly, we still want to be able to categorize things.
  • If we know the categories’ means and variances, this is straightforward: \[ p(c = b | x) = \frac{p(x | \mu_b, \sigma^2_b)p(c=b)}{p(x | \mu_b, \sigma^2_b)p(c=b) + p(x | \mu_p, \sigma^2_p)p(c=p)} \]
  • But we don’t know the means and variances!
  • Taking uncertainty about \(\mu, \sigma^2\) into account requires averaging over plausible values (“marginalizing” in Bayesian jargon).

\[ \begin{align} p(c=\mathrm{b} | x) = \int \cdots \int & d\mu_b \, d\mu_p \, d\sigma^2_b \, d\sigma^2_p \\ & \frac{p(x | \mu_b, \sigma^2_b) p(c=b)}{p(x | \mu_b, \sigma^2_b) p(c=b) + p(x | \mu_p, \sigma^2_p) p(c=p)} \\ & p(\mu_b, \mu_p, \sigma^2_b, \sigma^2_p) \end{align} \]

Doing things under uncertainty…with samples!

  • An integral is really a weighted average!
  • So we can calculate the category boundary for each sample, and average them together.
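As a sketch of that weighted average in R, with weighted samples for both categories (all numbers invented, weights left uniform, and equal prior category probabilities assumed):

```r
set.seed(3)
K <- 2000

# Illustrative weighted samples: /b/ has short VOT, /p/ long VOT (ms).
mu_b <- rnorm(K, 0, 5);  sigma2_b <- 1 / rgamma(K, shape = 3, rate = 200)
mu_p <- rnorm(K, 50, 5); sigma2_p <- 1 / rgamma(K, shape = 3, rate = 200)
w    <- rep(1 / K, K)    # weights from belief updating; uniform here

x <- 20  # VOT value to categorize

# Per-sample posterior probability of /b/, with p(c=b) = p(c=p).
lik_b <- dnorm(x, mu_b, sqrt(sigma2_b))
lik_p <- dnorm(x, mu_p, sqrt(sigma2_p))
p_b_given_sample <- lik_b / (lik_b + lik_p)

# The high-dimensional integral collapses to a weighted average over samples.
p_b <- sum(w * p_b_given_sample)
p_b
```

No multidimensional integration required: categorize as if each sampled hypothesis were true, then average the answers with the sample weights.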

Flexibility of sampling

What if category isn’t known?

Weird priors

  • Analytical inference relies on using a conjugate prior which can be very restrictive.
  • For the normal distribution, conjugacy means that uncertainty about the mean depends on the category variance.
  • What if we’re very confident about the variance \(\sigma_p^2\) but not the mean \(\mu_p\)?
  • There are no analytical updating rules for this, but we can sample from the prior!
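There is no conjugate updating rule for such a prior, but the sampling recipe is unchanged: draw from the prior and reweight by the likelihood. A sketch with invented numbers, putting a tight lognormal on \(\sigma^2_p\) and a diffuse, independent normal on \(\mu_p\):

```r
set.seed(4)
K <- 2000

# Non-conjugate prior: sigma^2 and mu sampled independently, with
# sigma^2 tightly concentrated near 100 ms^2 and mu very uncertain.
sigma2_p <- rlnorm(K, meanlog = log(100), sdlog = 0.05)
mu_p     <- rnorm(K, mean = 50, sd = 30)

# Reweight by the likelihood of some observed VOT values (made up),
# exactly as in the conjugate case.
x <- c(55, 60, 48)
loglik <- sapply(seq_len(K), function(k)
  sum(dnorm(x, mu_p[k], sqrt(sigma2_p[k]), log = TRUE)))
w <- exp(loglik - max(loglik))
w <- w / sum(w)

# Posterior mean of mu_p under this non-conjugate prior.
sum(w * mu_p)
```

The only requirement is that the prior can be sampled from and the likelihood evaluated; conjugacy buys closed-form updates, but sampling buys flexibility.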